Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add e3sm_to_cmip task #650

Merged
merged 16 commits into from
Dec 18, 2024
Merged

Add e3sm_to_cmip task #650

merged 16 commits into from
Dec 18, 2024

Conversation

forsyth2
Copy link
Collaborator

@forsyth2 forsyth2 commented Dec 7, 2024

Issue resolution

Select one: This pull request is...

  • a bug fix: increment the patch version
  • a small improvement: increment the minor version
  • an incompatible (non-backwards compatible) API change: increment the major version

1. Does this do what we want it to do?

Objectives:

  • Remove e3sm_to_cmip functionality from the ts task and place it in its own task.
  • Update dependency handling and input chaining accordingly.
  • Improve maintainability by pulling out common code into boilerplate.bash.

Required:

  • Product Management: I have confirmed with the stakeholders that the objectives above are correct and complete.
  • Testing: I have added or modified at least one "min-case" configuration file to test this change. Every objective above is represented in at least one cfg.
  • Testing: I have considered likely and/or severe edge cases and have included them in testing.

If applicable:

  • Testing: this pull request introduces an important feature or bug fix that we must test often. I have updated the weekly-test configuration files, not just a "min-case" one.
  • Testing: this pull request adds at least one new possible parameter to the cfg. I have tested using this parameter with and without any other parameter that may interact with it.

2. Are the implementation details accurate & efficient?

Required:

  • Logic: I have visually inspected the entire pull request myself.
  • Logic: I have left GitHub comments highlighting important pieces of code logic. I have had these code blocks reviewed by at least one other team member.

If applicable:

  • Dependencies: This pull request introduces a new dependency. I have discussed this requirement with at least one other team member. The dependency is noted in zppy/conda, not just an import statement.
  • No new dependencies.

3. Is this well documented?

Required:

  • Documentation: by looking at the docs, a new user could easily understand the functionality introduced by this pull request.
    • No rst documentation has been added. The bulk of zppy's usage details are found in default.ini.

4. Is this code clean?

Required:

  • Readability: The code is as simple as possible and well-commented, such that a new team member could understand what's happening.
  • Pre-commit checks: All the pre-commits checks have passed.

If applicable:

  • Software architecture: I have discussed relevant trade-offs in design decisions with at least one other team member. It is unlikely that this pull request will increase tech debt.

@forsyth2 forsyth2 self-assigned this Dec 7, 2024
@forsyth2 forsyth2 mentioned this pull request Dec 7, 2024
@forsyth2 forsyth2 added the semver: incompatible API change Incompatible API change (will increment major version) label Dec 7, 2024
@forsyth2
Copy link
Collaborator Author

forsyth2 commented Dec 7, 2024

To do:

  • Check ts and e3sm_to_cmip have the right parameters defined in their default.ini sections and .py/.bash files.
  • Get the weekly tests running with separate e3sm_to_cmip tasks.
  • Add unit tests
  • Update all necessary integration test cfgs to have separate e3sm_to_cmip tasks. NOTE: The fact that this PR requires changes to the cfgs in order for them to work means it is a non-backwards-compatible change. And that means we'll need to increment the major version. I.e., the next release will be zppy v3.0.0!

@forsyth2 forsyth2 force-pushed the issue-649-e3sm2cmip-task branch from e3e48cc to 3a8dc06 Compare December 10, 2024 22:29
{% include cmip_metadata %}
EOF
{
export cmortables_dir={{ cmor_tables_prefix }}/cmip6-cmor-tables/Tables
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@forsyth2 I just found a bug in e3sm_to_cmip task when testing zppy (in e3sm-unified v1.10) on a Livermore machine . The cmortables_dir should be cmortables_dir={{ cmor_tables_prefix }}/e3sm_to_cmip_data/cmip6-cmor-tables/Tables. The tests on LCRC won't catch this problem, because somehow there is an excessive cmip6-cmor-tables directory under /lcrc/group/e3sm/diagnostics/ which does not exist in the centralized input data locawtion.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed in e7d43e1

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you!

@forsyth2
Copy link
Collaborator Author

@chengzhuzhang I'm currently debugging this PR using the weekly test v3 cfg. I think the bulk of the work is done but there remains a lot of ironing out small issues (e.g., making sure dependencies are handled correctly, making sure the right parameters are passed along). My hope though is that we can actually get this PR included in time for the next Unified, meaning we'll jump straight from v2.5.0 to v3.0.0.

@forsyth2 forsyth2 mentioned this pull request Dec 12, 2024
3 tasks
Copy link
Collaborator Author

@forsyth2 forsyth2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chengzhuzhang Ok I've run unit tests and the weekly tests. (The image check tests pass meaning no images changed, so that's good. The bash check tests do fail because obviously the bash differs now, so I will update expected results once this merges).

Extremely detailed testing/debugging steps
# Set up branch
cd ~/ez/zppy
git add -A

# Set up zppy-interfaces env
cd ~/ez/zppy-interfaces
git status
# Check no file changes will persist when we switch branches.
git fetch upstream main
git checkout -b test_zi_main_20241211 upstream/main
git log
# Last commit, from 11/26: Create manual test cases (#8)
conda clean --all --y
conda env create -f conda/dev.yml -n zi_main_20241211
conda activate zi_main_20241211
pip install .

# Set up e3sm_diags env
# Have to use pre-CDAT-migrated e3sm_diags
# Because https://github.com/E3SM-Project/zppy/pull/651 hasn't merged yet
cd ~/ez/e3sm_diags
git status
# Check no file changes will persist when we switch branches.
git fetch upstream
git checkout -b main_2.12.1 ca41b0e5d913610c88410928951f1ed11c75663f
git log
# Last commit, from 11/20: Release 2.12.1 (#893)
conda clean --all --y
conda env create -f conda-env/dev.yml -n e3sm_diags_2.12.1
conda activate e3sm_diags_2.12.1
pip install .

# Set up zppy env
cd ~/ez/zppy
conda clean --all --y
conda env create -f conda/dev.yml -n zppy_dev_pr650_e3sm_to_cmip
conda activate zppy_dev_pr650_e3sm_to_cmip
pip install .

# Unit tests for zppy-interfaces
cd ~/ez/zppy-interfaces
conda activate zi_main_20241211
pytest tests/unit/global_time_series/test_*.py
# 7 passed in 24.88s

# Unit tests for zppy
cd ~/ez/zppy
conda activate zppy_dev_pr650_e3sm_to_cmip
pytest tests/test_*.py
# 4 failed, 19 passed, 1 warning in 0.92s
# Made edits to test code only
pre-commit run --all-files
git add -A
pytest tests/test_*.py
# 23 passed in 0.08s

# Edit tests/integration/utils.py:
# UNIQUE_ID = "test_zppy_pr650_20241211"
# For get_chyrsalis_expansions:
#        "diags_environment_commands": "source /gpfs/fs1/home/ac.forsyth2/miniforge3/etc/profile.d/conda.sh; conda activate e3sm_diags_2.12.1",
#        "global_time_series_environment_commands": "source /gpfs/fs1/home/ac.forsyth2/miniforge3/etc/profile.d/conda.sh; conda activate zi_main_20241211",
# Keep this as is:     generate_cfgs(unified_testing=False, dry_run=False)

# Run zppy to produce actual results to compare in the integration tests
python tests/integration/utils.py
pip install .
git add -A
zppy -c tests/integration/generated/test_weekly_comprehensive_v3_chrysalis.cfg
cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/test_zppy_pr650_20241211/v3.LR.historical_0051/post/scripts
grep -v "OK" *status
# e3sm_to_cmip_land_monthly_1985-1986-0002 has an error
rm ilamb_1985-1988.status # Remove WAITING file

cd ~/ez/zppy
zppy -c tests/integration/generated/test_weekly_comprehensive_v3_chrysalis.cfg
cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/test_zppy_pr650_20241211/v3.LR.historical_0051/post/scripts
grep -v "OK" *status
# No errors, but ilamb has no results on https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_weekly_comprehensive_v3_www/test_zppy_pr650_20241211/v3.LR.historical_0051/ilamb/_1985-1988/

# Run integration tests
cd ~/ez/zppy
pytest tests/integration/test_weekly.py
# Passes
# Turns out ILAMB in expected results also doesn't show anything: https://web.lcrc.anl.gov/public/e3sm/zppy_test_resources/expected_comprehensive_v3/ilamb/_1985-1988/
# So the only issue introduced by this PR is that the e3sm_to_cmip land task had to get manually re-run instead of relying on automatic dependency handling.
# Added `dependFiles=dependencies,` to e3sm_to_cmip.py
# Edit tests/integration/utils.py:
# UNIQUE_ID = "test_zppy_pr650_20241211v2"
python tests/integration/utils.py
pip install .
git add -A
zppy -c tests/integration/generated/test_weekly_comprehensive_v3_chrysalis.cfg
zppy -c tests/integration/generated/test_weekly_comprehensive_v2_chrysalis.cfg

That gives:

Traceback (most recent call last):
  File "/gpfs/fs1/home/ac.forsyth2/miniforge3/envs/zppy_dev_pr650_e3sm_to_cmip/bin/zppy", line 8, in <module>
    sys.exit(main())
  File "/gpfs/fs1/home/ac.forsyth2/miniforge3/envs/zppy_dev_pr650_e3sm_to_cmip/lib/python3.9/site-packages/zppy/__main__.py", line 61, in main
    _launch_scripts(config, script_dir, job_ids_file, plugins)
  File "/gpfs/fs1/home/ac.forsyth2/miniforge3/envs/zppy_dev_pr650_e3sm_to_cmip/lib/python3.9/site-packages/zppy/__main__.py", line 237, in _launch_scripts
    existing_bundles = e3sm_diags(config, script_dir, existing_bundles, job_ids_file)
  File "/gpfs/fs1/home/ac.forsyth2/miniforge3/envs/zppy_dev_pr650_e3sm_to_cmip/lib/python3.9/site-packages/zppy/e3sm_diags.py", line 87, in e3sm_diags
    submit_script(
  File "/gpfs/fs1/home/ac.forsyth2/miniforge3/envs/zppy_dev_pr650_e3sm_to_cmip/lib/python3.9/site-packages/zppy/utils.py", line 415, in submit_script
    raise DependencySkipError(skip_message)
zppy.utils.DependencySkipError: ...skipping because of dependency status file missing
   /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v2_output/test_zppy_pr650_20241211v2/v2.LR.historical_0201/post/scripts/e3sm_to_cmip_atm_monthly_180x360_aave_1982-1983-0002.status

Need to update the other templates too...

# Edit tests/integration/utils.py:
# UNIQUE_ID = "test_zppy_pr650_20241211v3"
python tests/integration/utils.py
pip install .
git add -A
zppy -c tests/integration/generated/test_weekly_comprehensive_v3_chrysalis.cfg
zppy -c tests/integration/generated/test_weekly_comprehensive_v2_chrysalis.cfg
zppy -c tests/integration/generated/test_weekly_bundles_chrysalis.cfg

zppy -c tests/integration/generated/test_weekly_bundles_chrysalis.cfg

cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/test_zppy_pr650_20241211v3/v3.LR.historical_0051/post/scripts
grep -v "OK" *status
# e3sm_to_cmip_land_monthly_1985-1986-0002.status:ERROR (1)
# e3sm_to_cmip_land_monthly_1987-1988-0002.status:ERROR (1)
# ilamb_1985-1988.status:WAITING 644334

cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v2_output/test_zppy_pr650_20241211v3/v2.LR.historical_0201/post/scripts
grep -v "OK" *status
# No errors

cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_bundles_output/test_zppy_pr650_20241211v3/v3.LR.historical_0051/post/scripts/
grep -v "OK" *status
# No errors

cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/test_zppy_pr650_20241211v3/v3.LR.historical_0051/post/scripts
rm ilamb_1985-1988.status
cd ~/ez/zppy
# Added `dependFiles=dependencies,` to submitScript call in e3sm_to_cmip.py
# That's why dependencies weren't being included.
cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/test_zppy_pr650_20241211v3/v3.LR.historical_0051/post/scripts
grep -v "OK" *status
# Now, there's no errors

# Run integration tests
cd ~/ez/zppy
# Comment out skip lines
pytest tests/integration/test_weekly.py

That gives:

===================================================================================================== short test summary info ======================================================================================================
FAILED tests/integration/test_weekly.py::test_bundles_bash_file_list - AssertionError: assert {'bundle1.bas...86.bash', ...} == {'bundle1.bas...86.bash', ...}
FAILED tests/integration/test_weekly.py::test_bundles_bash_file_content - assert 256 == 0
============================================================================================= 2 failed, 3 passed in 1217.72s (0:20:17) ============================================================================================

Got test_bundles_bash_file_list passing by updating expected files.

test_bundles_bash_file_content is just failing because the expected results need to be updated in zppy_test_resources after this PR merges.

Can you review this PR at least by visual inspection? (You can probably skip reviewing the tests/ directory). Thanks!

input_component = string(default=None)

[tc_analysis]
# NOTE: always overrides value in [default]
input_files = string(default="eam.h2")
# TODO for v3.0.0: Remove this parameter
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made notes of things we should also change for the v3 release, but that don't fit in with the changes here. #653 will track these things.

dependencies,
script_dir,
"e3sm_to_cmip",
ts_sub, # Relies on having the same subsection name as the corresponding ts subsection!
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chengzhuzhang All the dependency handling is predicated on the e3sm_to_cmip subtasks sharing names with the corresponding ts subtasks. Is that an appropriate assumption?

STARTTIME=$(date +%s)
echo "RUNNING ${id}" > {{ prefix }}.status
{% include 'inclusions/slurm_header.bash' %}
{% include 'inclusions/boilerplate.bash' %}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Common code in the bash templates is now pulled out into one common file.

@forsyth2 forsyth2 marked this pull request as ready for review December 12, 2024 01:08
@chengzhuzhang
Copy link
Collaborator

Sounds good! Thank you, @forsyth2. I will review the codes tomorrow.

years = "1985:1995:5"

[[ atm_monthly_180x360_aave ]]
# for e3sm_diags (depend_on_ts), ilamb
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e3sm_diags currently doesn't depend on e3sm_to_cmip, pmp will

Copy link
Collaborator

@chengzhuzhang chengzhuzhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this implementation, I believe the [e3sm_to_cmip] configuration can be simplified as follows?

[e3sm_to_cmip]
active = True
walltime = "00:50:00"

years = "1985:1995:5"

[[ atm_monthly_180x360_aave ]]

[[ land_monthly ]]

One possible improvement to add some flexibility to configuration is to add variable lists for atm and land (e.g. cmip_var_atm, cmip_var_land), so that the user can specify only the variables needs to be convert to cmip format. The default values can be all e3sm_to_cmip supported variables. For this moment we can use:
cmip_var_atm = 'pr, tas, rsds, rlds, rsus'
cmip_var_land = 'mrsos, mrso, mrfso, mrros, mrro, prveg, evspsblveg, evspsblsoi, tran, tsl, lai, cLitter, cProduct, cSoilFast, cSoilMedium, cSoilSlow, fFire, fHarvest, cVeg, nbp, gpp, ra, rh'

{% endif -%}
{% if input_files.split(".")[0] == 'cam' or input_files.split(".")[0] == 'eam' -%}
--var-list \
'pr, tas, rsds, rlds, rsus' \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider to make the value of --var-list configurable.

${dest_cmip}/${tmp_dir} \
{% if input_files.split(".")[0] == 'clm2' or input_files.split(".")[0] == 'elm' -%}
--var-list \
'mrsos, mrso, mrfso, mrros, mrro, prveg, evspsblveg, evspsblsoi, tran, tsl, lai, cLitter, cProduct, cSoilFast, cSoilMedium, cSoilSlow, fFire, fHarvest, cVeg, nbp, gpp, ra, rh' \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider to make the value of --var-list configurable.

@forsyth2 forsyth2 force-pushed the issue-649-e3sm2cmip-task branch 2 times, most recently from 57094a3 to 382d6c5 Compare December 12, 2024 19:16
@forsyth2 forsyth2 force-pushed the issue-649-e3sm2cmip-task branch from 382d6c5 to 5812010 Compare December 12, 2024 19:18
@forsyth2
Copy link
Collaborator Author

@chengzhuzhang Thanks, 5812010 addresses your comments. Does that look good? If so, I'll run the test suite again to make sure nothing broke and then we can merge this.

@forsyth2
Copy link
Collaborator Author

@chengzhuzhang In testing, I ran into some issues with the new parameter deletions that highlighted a design decision we need to make re: how we handle ts as a dependency and access its output.

Method 1: The approach taken so far in this PR, for the e3sm_to_cmip task

Handling the ts dependency

In e3sm_to_cmip.py:

            add_dependencies(
                dependencies,
                script_dir,
                "ts",
                sub,  # Relies on having the same subsection name as the corresponding ts subsection!
                c["yr_start"],
                c["yr_end"],
                c["ts_num_years"],
            )

This is why it's important to set ts_num_years in the cfg.

Accessing ts output

In templates/e3sm_to_cmip.bash, dest tells us where to look for ts output.

dest={{ output }}/post/{{ component }}/{{ grid }}/ts/{{ frequency }}/{{ '%dyr' % (ypf) }}

output and frequency are just defined in the cfg. The others are defined in e3sm_to_cmip.py via:

component =>

        set_component_and_prc_typ(c)

grid =>

        set_mapping_file(c)
        set_grid(c)

Note that this means we need to have mapping_file defined for the e3sm_to_cmip task.

ypf =>

            c["yr_start"] = s[0]
            c["yr_end"] = s[1]
            if ("last_year" in c.keys()) and (c["yr_end"] > c["last_year"]):
                continue  # Skip this year set
            c["ypf"] = s[1] - s[0] + 1

Method 2: The approach in the ILAMB task

4 parameters are defined in default.ini:

ts_atm_subsection = string(default="")
ts_land_subsection = string(default="")
# Name of the grid used by the relevant `[ts]` `atm` task
ts_atm_grid = string(default="180x360_aave")
# Name of the grid used by the relevant `[ts]` `land` task
ts_land_grid = string(default="180x360_aave")

Handling the ts dependency

In ilamb.py, this code block handles the ts land subtask dependency (there is an equivalent one for atm too):

    define_or_guess2(
        c, "ts_land_subsection", "land_monthly", ParameterGuessType.SECTION_GUESS
    )
    add_dependencies(
        dependencies,
        script_dir,
        "ts",
        c["ts_land_subsection"],
        c["year1"],
        c["year2"],
        c["ts_num_years"],
    )
    add_dependencies(
        dependencies,
        script_dir,
        "e3sm_to_cmip",
        c[
            "ts_land_subsection"
        ],  # Relies on having the same subsection name as the corresponding ts subsection!
        c["year1"],
        c["year2"],
        c["ts_num_years"],
    )
  • If ts_land_subsection isn't provided, we guess that it's named "land_monthly".
  • This is the code block from this PR, so you can see it assumes e3sm_to_cmip and ts will have the same subtask names. Do we want that? If not, we'd really need 2 parameters --ts_land_subsection and e3sm_to_cmip_land_subsection.

Accessing ts output

In ilamb.bash, we can see the ts_land_grid and ts_atm_grid parameters are used to find the ts output (as opposed to computing the grid ourselves as in method 1

lnd_ts_for_ilamb={{ output }}/post/lnd/{{ ts_land_grid }}/cmip_ts/monthly/
atm_ts_for_ilamb={{ output }}/post/atm/{{ ts_atm_grid }}/cmip_ts/monthly/
cp -s ${lnd_ts_for_ilamb}/*_*_*_*_*_*_${Y1}??-${Y2}??.nc .
cp -s ${atm_ts_for_ilamb}/*_*_*_*_*_*_${Y1}??-${Y2}??.nc .

Summary table

Summary of current implementation:

e3sm_to_cmip ilamb
Dependency handling assume the ts dependency has the same name use ts_{component}_subsection, assume e3sm_to_cmip will share a subsection name with its corresponding ts dependency
ts output access re-compute the grid and component in the subtask use ts_{component}_grid

Options

  1. We set up the ILAMB task like the e3sm_to_cmip task => both use method 1 (i.e., auto-determine what the ts task must have been named)
  2. We set up the e3sm_to_cmip task like ILAMB is set up => both use method 2 (i.e., have component-by-component parameters to point to dependencies and output.
  3. Leave them each as-is; they should take different approaches.

What is the use case for e3sm_to_cmip? E.g., ILAMB is pretty clearly just for land (and sometimes atmosphere). If e3sm_to_cmip should be available for any of the components, Method 2 could get unwieldy, adding quite a few parameters. On the other hand, Method 2 allows for flexibility in naming subsections.

@chengzhuzhang
Copy link
Collaborator

I think it is a good idea to have some consistency in parameters. I feel option two is more appealing in this case, because in option 1, to auto determine data path, it requires more parameters, and it is not very intuitive to supply mapping file in this case for deduce the path.

e3sm_to_cmip is also needed by pmp task, in addition to ilamb. But I don't think more parameters needs to be defined in the new use case. And most likely pmp and ilamb will use the same set of e3sm_to_cmip created data.

@forsyth2
Copy link
Collaborator Author

@chengzhuzhang Ok I did some parameter shuffling and it appears to be working now. Changes are in 6e37e19

Notably:

  • e3sm_to_cmip actually has separate jobs per component, unlike ILAMB which includes atm and land data in a single job; that means e3sm_to_cmip doesn't need specific per-component parameters like ILAMB. That means we just need to have ts_grid (which defaults to "180x360_aave"), ts_subsection (which defaults to the name of the e3sm_to_cmip task), and cmip_vars (which defaults to different variable lists based on component)
  • ILAMB gets two new parameters -- to specify the e3sm_to_cmip atm and land tasks, which also have defaults set.

I'm going to do final testing again; hoping everything works nicely now.

@forsyth2
Copy link
Collaborator Author

@chengzhuzhang I've had trouble getting nodes for testing today. I was running into some file path issues that I need to debug more. Nevertheless, I'm hoping we should be able to merge this early next week.

@forsyth2
Copy link
Collaborator Author

@chengzhuzhang It's also been challenging to get nodes today. I'm still waiting for Bundles to run, but I did get the v3 image check test to pass. The v2 test failed though.

Details

Specifically, the lines I've been having trouble with are in ilamb.bash:

cp -s ${lnd_ts_for_ilamb}/*_*_*_*_*_*_${Y1}??-${Y2}??.nc .
cp -s ${atm_ts_for_ilamb}/*_*_*_*_*_*_${Y1}??-${Y2}??.nc .

I found that the v2 test was asking for data in different increments.

  • v2 test: ts, e3sm_to_cmip, ilamb: years = "1980:1984:2",
  • v3 test: ts, e3sm_to_cmip: years = "1985:1989:2",, ilamb: years = "1985:1989:4"

The v3 cfg didn't actually have any expected output from ILAMB because of this. It was looking for files matching 1985 -> 1988 but only seeing ts files of the form 1985 -> 1986, 1987 -> 1988.

This was apparently not discovered earlier because there was no error check in ilamb.bash for these cp lines. They now have error checks.

This issue was fixed for dependencies in #603. There, the add_dependencies function was added to split up the years into desired increment. E.g., if we're trying to run ILAMB for 1985-1988 on 1985-1986 and 1987-1988 data, then we actually do have all the data we need.

However, what was missed was that these year ranges also appeared in ilamb.bash.

The add_dependencies function is a bit involved for bash, so I figured I could just copy all the data.

So in my latest run, those two lines had become:

cp -s ${lnd_ts_for_ilamb}/*.nc .
if [ $? != 0 ]; then
  cd {{ scriptDir }}
  echo 'ERROR (1)' > {{ prefix }}.status
  exit 1
fi
cp -s ${atm_ts_for_ilamb}/*.nc .
if [ $? != 0 ]; then
  cd {{ scriptDir }}
  echo 'ERROR (2)' > {{ prefix }}.status
  exit 2
fi

This actually allowed the v3 cfg to generate ILAMB output. (Adding actual output doesn't fail the test because the test is only looking for missing or mismatched output). Unfortunately, in the process it broke the v2 cfg, which did want separated data.

Looking at the v2 image diffs, there were only slight variations, until I came across actual and expected. You can see that actual ranges 2000-2004 while expected ranges 2000-2002. (Side note: I have no idea where 2000 came from; shouldn't it have 1985 as the start year?)

So that means we really can't just copy over everything, because ILAMB is now seeing 4 years of data when we only want to process 2.

Potential paths forward:

  1. Implement similar logic to add_dependencies in ilamb.bash.
  2. Construct the paths lnd_ts_for_ilamb and atm_ts_for_ilamb in ilamb.py, where we can work entirely in Python.

I think option 2 is better for code maintainability (I strongly believe we should be limiting the complexity of the bash code), so I'll proceed with that.

Timeline

I think I can still get this merged this week, assuming further issues don't pop up. My main problem at the moment is getting compute partition time for comprehensive testing (but debug at least works for quick tests).

@forsyth2
Copy link
Collaborator Author

forsyth2 commented Dec 17, 2024

@chengzhuzhang Well actually now I'm not sure what's going on. I created one "min case" quick test cfg to do 30 years of ILAMB on 2 sets of 15-year ts/e3sm_to_cmip and one to do 30 years of ILAMB on 1 set of 30-year ts/e3sm_to_cmip. This would be easier to debug after originally running into the issue on the comprehensive test.

However both appear to produce the same results, showing 2000-2017 on a plot at the bottom of the page: 1 30-year set and 2 15-year sets

(Again, I don't understand why the years on that time series never seem to have any relation to the input data years...)

I'll have to debug this further.

@chengzhuzhang
Copy link
Collaborator

chengzhuzhang commented Dec 17, 2024

@forsyth2 Maybe we should consider to have the same configuration for years. I also have some question, should years for ilamb match how years are used for ts and e3sm_to_cmip? Or years means differently for ilamb? For instance, should we support both
"1980:1984:2" and "1980:1984:4" as "years" for ilamb? with the former have 2 sets of ilamb runs, latter for 1 set?

The question with why 2000 is showing up here, this has been noted here. The idea is to align simulation with obs years. This may not be ideal, but a solution recommended by ilamb team at the time ilamb is being integrated.

@forsyth2
Copy link
Collaborator Author

@chengzhuzhang

Maybe we should consider to have the same configuration for years.

Well that would negate the whole point of #603 then. That PR resolved #599, which described issue of wanting a 30-year ILAMB run to use 6 5-year ts data sets, rather than 1 30-year ts data set.

should years for ilamb match how years are used for ts and e3sm_to_cmip? For instance, should we support both "1980:1984:2" and "1980:1984:4" as "years" for ilamb?

Yes, that is how it is set up. "1980:1984:2" would create 2 ILAMB jobs: 1980-1981, 1982-1983. "1980:1984:4" would create 1 ILAMB job: 1980-1983.

ts_num_years is used to tell us what increments the ts task is in. That parameter is actually also used in e3sm_diags.bash for the same purpose:

    # Go through the time series files for between year1 and year2, using a step size equal to the number of years per time series file
    for year in `seq ${begin_year} {{ ts_num_years }} ${end_year}`;
    do
      YYYY=`printf "%04d" ${year}`
      for file in ${ts_dir_source}/${v}_${YYYY}*.nc
      do
        # Add this time series file to the list of files for cdscan to use
        echo ${file} >> ${v}_files.txt
      done
    done

So, maybe I should just do the same for ILAMB for consitency, though in the future I would like to reduce the computation done in bash (in favor of doing it in python).

The question with why 2000 is showing up here, this has been noted

Oh I see, thanks!

@chengzhuzhang
Copy link
Collaborator

Thanks for clarifying, @forsyth2.

Maybe we should consider to have the same configuration for years.

Well that would negate the whole point of #603 then. That PR resolved #599, which described issue of wanting a 30-year ILAMB run to use 6 5-year ts data sets, rather than 1 30-year ts data set.

#603 to resolve #599 is good. I should have clarified my question. I mean we should consider same configuration for "years" in v2 vs v3 configuration files. For the two sets of configuration files for different data, we only need a minimized sets of parameters that are data specific, such as paths, mapping files, if tc_analysis is on. Eventually, we can reduce the number of .cfg files to make testing more manageable.

@forsyth2
Copy link
Collaborator Author

I mean we should consider same configuration for "years" in v2 vs v3 configuration files.

Oh, no I actually prefer they have different configurations -- it provides a better cross-section of parameter combinations to test. For instance, this bug wouldn't have been caught if they had the same configurations.

As for this bug, it appears my latest commit 0bd5d90 fixes it. I just need to re-run the full test on the latest code now to make sure no other issues pop up.

@forsyth2
Copy link
Collaborator Author

I have reviewed this PR by visual inspection. I checked that all integration tests either pass or have expected failures; I have updated expected results accordingly. This is ready to merge.

For reference: detailed testing steps 2024-12-17/18
# Set up zppy-interfaces env
# zi_main_20241211 already set up

# Set up e3sm_diags env
# e3sm_diags_2.12.1 already set up
# Have to use pre-CDAT-migrated e3sm_diags
# Because https://github.com/E3SM-Project/zppy/pull/651 hasn't merged yet

# Set up zppy env
# zppy_dev_pr650_e3sm_to_cmip already set up, just need to include updated code
cd ~/ez/zppy
conda activate zppy_dev_pr650_e3sm_to_cmip
pre-commit run --all-files
pip install .

# Unit tests for zppy
pytest tests/test_*.py
# 23 passed in 0.33s

# Edit tests/integration/utils.py:
# UNIQUE_ID = "test_zppy_pr650_20241217"
# For get_chyrsalis_expansions:
#        "diags_environment_commands": "source /gpfs/fs1/home/ac.forsyth2/miniforge3/etc/profile.d/conda.sh; conda activate e3sm_diags_2.12.1",
#        "global_time_series_environment_commands": "source /gpfs/fs1/home/ac.forsyth2/miniforge3/etc/profile.d/conda.sh; conda activate zi_main_20241211",
# Keep this as is:     generate_cfgs(unified_testing=False, dry_run=False)

# Run zppy to produce actual results to compare in the integration tests
python tests/integration/utils.py
pip install .
git add -A

zppy -c tests/integration/generated/test_weekly_comprehensive_v3_chrysalis.cfg
zppy -c tests/integration/generated/test_weekly_comprehensive_v2_chrysalis.cfg
zppy -c tests/integration/generated/test_weekly_bundles_chrysalis.cfg

zppy -c tests/integration/generated/test_weekly_bundles_chrysalis.cfg

cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/test_zppy_pr650_20241217/v3.LR.historical_0051/post/scripts
grep -v "OK" *status
# No errors
cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v2_output/test_zppy_pr650_20241217/v2.LR.historical_0201/post/scripts
grep -v "OK" *status
# No errors
cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_bundles_output/test_zppy_pr650_20241217/v3.LR.historical_0051/post/scripts/
grep -v "OK" *status
# No errors

cd ~/ez/zppy
pytest tests/integration/test_weekly.py
# 1 failed, 4 passed in 1295.46s (0:21:35)
# test_bundles_bash_file_content failed, 
# which we expected because of the e3sm_to_cmip task addition

pytest tests/integration/test_bash_generation.py # 1 failed in 2.23s
pytest tests/integration/test_campaign.py # 6 failed in 3.86s
pytest tests/integration/test_defaults.py # 1 failed in 0.62s
pytest tests/integration/test_last_year.py # 1 passed in 0.30s
# The 8 test failures are expected because of the changes in this PR.

# We're running off the latest `main` + changes in this PR,
# so we can go ahead and update the expected results now.

cd /lcrc/group/e3sm/public_html
cp -r zppy_test_resources zppy_test_resources_previous/expected_results_until_20241218

cd ~/ez/zppy
chmod u+x tests/integration/generated/update_*_expected_files_chrysalis.sh
./tests/integration/generated/update_bash_generation_expected_files_chrysalis.sh
# 1 passed in 1.55s
./tests/integration/generated/update_campaign_expected_files_chrysalis.sh
# 6 passed in 2.15s
./tests/integration/generated/update_defaults_expected_files_chrysalis.sh
# 1 passed in 0.82s
./tests/integration/generated/update_weekly_expected_files_chrysalis.sh
# 5 passed in 1300.51s (0:21:40)

@forsyth2 forsyth2 merged commit 2891d44 into main Dec 18, 2024
4 checks passed
@forsyth2 forsyth2 deleted the issue-649-e3sm2cmip-task branch December 18, 2024 20:01
@forsyth2 forsyth2 mentioned this pull request Dec 18, 2024
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
semver: incompatible API change Incompatible API change (will increment major version)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Enhancement]: Separate e3sm_to_cmip task from ts Make e3sm_to_cmip its own task
2 participants